Background/context:

Description of prior research, its intellectual context and its policy context. 



Included in the recent push for rigorous evaluations of programs that are effective in 
supporting students’ learning and achievement is an emphasis on measuring implementation 
fidelity and linking those measures to program impacts. Claims of treatment effectiveness may 
be unjustified and invalid unless the degree to which programs are implemented as intended is 
defined and assessed. However, despite this emphasis on measuring implementation fidelity, 
recent reviews of studies in school settings have illustrated that many inconsistencies and 
omissions in measuring fidelity exist (Dusenbury, 2003; O’Donnell, 2008; Ruiz-Primo, 2005). 
Furthermore, little is known regarding the feasibility of conducting studies of implementation 
fidelity of unscripted interventions, where measuring fidelity first requires the identification and 
operationalization of complex, subtle facets of the intervention (Cordray & Pion, 2006). The 
field is in need of progress in this area to capitalize on the potential of fidelity studies to identify 
the most problematic areas of implementing a program and to provide feedback to developers for 
refining a program (Ruiz-Primo, 2005). 

Purpose / objective / research question / focus of study: 

Description of what the research focused on and why. 



In this paper, we describe a case of measuring implementation fidelity within an 
evaluation of Math Recovery (MR), a pullout tutoring program for low-achieving first-graders. 
We use this case to address two aspects of implementation fidelity studies: 1) their feasibility 
with respect to unscripted interventions, and 2) their relationship to ongoing program 
development. The aim of the MR program is to use children’s current understandings of number 
as bases for providing instruction that will support them in constructing increasingly 
sophisticated strategies. Therefore, assessing fidelity in this case is not as simple as monitoring 
adherence to a script, but requires assessing the complex practice of delivering mathematics 
instruction attuned to a child’s current understanding and needs. 

Our intentions were to both measure the extent to which the program was implemented as 
intended, and link the measures to student outcomes. Determining the extent to which the 
tutoring is enacted as intended requires an explication of ‘good’ tutoring as defined by the 
developers and systematically evaluating tutors’ practices against that ideal. However, we also 
go beyond MR’s notion of ‘good’ tutoring by looking for instances of “positive infidelity” 
(Cordray & Hulleman, 2009) within tutoring sessions. Thus, we view studies of implementation 
fidelity as potential sources for refining theory and program design. 

Setting: 

Description of where the research took place. 



The two-year evaluation of Math Recovery was conducted in 20 elementary schools (five 
urban, ten suburban and five rural), representing five districts in two states. Each was a ‘fresh 
site’ in that the program was implemented for the first time for the purposes of the study. 

Population / Participants / Subjects:

Description of participants in the study: who (or what) how many, key features (or characteristics). 



Students were selected for participation at the start of first grade based on their 
performance on MR’s screening interview and follow-up assessment interview. Eighteen 
teachers from the participating districts, all of whom had at least two years of classroom 
teaching experience, were recruited to receive training and participate as MR tutors. Sixteen of the 
tutors received half-time teaching releases to serve one school each; two of the tutors served two 
schools each. All tutoring positions were underwritten by their respective school districts. 

Intervention / Program / Practice: 

Description of the intervention, program or practice, including details of administration and duration. 



To begin, the MR tutor conducts an extensive video-recorded assessment interview with 
each child identified as eligible for the program. The tutor analyzes these video-recordings to 
develop a detailed profile of each child’s knowledge of the central aspects of arithmetic using the 
MR Learning Framework, which provides information about student responses in terms of levels 
of sophistication. The one-to-one tutoring that follows is diagnostic in nature and focuses 
instruction at the current limits of each child’s arithmetical reasoning. Each selected child 
receives four to five 30-minute one-to-one tutoring sessions each week for approximately 11 weeks. 
Every lesson is video-recorded for purposes of daily reflection and planning. The tutor’s 
selection of tasks for sessions with a particular child is initially informed by the assessment 
interview and then by ongoing assessments based on the student’s responses to prior 
instructional tasks. The Learning Framework that the tutor uses to analyze student performance 
is linked to the MR Instructional Framework that describes a range of instructional tasks 
organized by the level of sophistication of the students' reasoning together with detailed guidance 
for the tutor. 

Guiding the fidelity assessment were what we, in collaboration with program developers, 
determined to be the unique aspects of Math Recovery tutoring as compared to typical tutoring: 
(a) the tutor’s ongoing assessment of the child’s thinking and strategies (both reflective 
assessment between tutoring sessions and in-the-moment assessment); and (b) the tutor’s efforts 
to provide instruction within the child’s zone of proximal development. 

Research Design: 

Description of research design (e.g., qualitative case study, quasi-experimental design, secondary analysis, analytic 
essay, randomized field trial). 

The larger evaluation study was a randomized field trial. In each year (the 2007-08 and 2008-09 
academic years), 17 to 36 students deemed eligible (based on an initial MR screening) from 
each of 20 schools were randomly assigned to one of three tutoring cohorts or to the “wait list” 
for MR. The cohorts, consisting of three students each, were staggered across different start dates 
(i.e., Cohort A — September, B — December, C — March). In both years students on the randomly 
ordered waiting list were selected to join an MR tutoring cohort if an assigned participant left the 
school or was deemed “ineligible” due to a special education placement. The number of study 
participants totaled 517 in Year 1 and 510 in Year 2, of which 172 received tutoring in Year 1 
and 171 received tutoring in Year 2. Consistent with typical MR practice, all sessions were 
video-recorded. 
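
As an illustration only, the within-school assignment described above could be sketched as follows. The abstract does not specify the randomization procedure, so the function names (`assign_school`, `replace_participant`), the use of a seeded shuffle, and the student identifiers are all assumptions made for this sketch.

```python
import random


def assign_school(student_ids, cohort_labels=("A", "B", "C"), cohort_size=3, seed=None):
    """Randomly assign one school's eligible students (hypothetical helper).

    Returns a dict of cohort label -> list of students, plus a randomly
    ordered wait list holding every student not placed in a cohort.
    """
    rng = random.Random(seed)
    shuffled = list(student_ids)
    rng.shuffle(shuffled)

    cohorts = {}
    for i, label in enumerate(cohort_labels):
        cohorts[label] = shuffled[i * cohort_size:(i + 1) * cohort_size]

    # Everyone left over forms the wait list, already in random order.
    wait_list = shuffled[len(cohort_labels) * cohort_size:]
    return cohorts, wait_list


def replace_participant(cohorts, wait_list, label, departing_id):
    """Fill a vacated cohort slot with the next student on the wait list."""
    cohorts[label].remove(departing_id)
    if wait_list:
        cohorts[label].append(wait_list.pop(0))
```

With 20 eligible students in a school, this sketch places nine students in cohorts A, B, and C and leaves eleven on the ordered wait list.
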

In this paper, we focus our attention on the implementation fidelity component of the 
project, for which we followed the five-step method outlined by Cordray (2007): 1) describe the 
intervention change and logic models; 2) create indices to measure the fidelity of the 
implemented intervention to constructs identified in the models; 3) determine the reliability and 
validity of indices; 4) combine indices in the analysis; and 5) link fidelity measures to outcomes. 
Here, we report primarily on step three, the process of determining the reliability and validity of 
fidelity indices. 

At the outset, we included program developers in conversations to identify key 
implementation components (Fixsen, Naoom, Blase, Friedman, & Wallace, 2005) and initial 
schemes for measuring those constructs. The research team finalized the instruments through an 
iterative refinement process, grounded in MR’s guiding principles. A team of coders, trained in 
both MR tutoring (by MR expert trainers) and video coding (by the research team), was 
assigned a random selection of twelve tutoring sessions from one student per tutor per cycle (a 
total of 108 students). For purposes of external validation, a subset of tutoring sessions spanning 
a range of implementation fidelity index values, as determined by our coding scheme, was sent to 
20 MR experts, who rated the tutoring practices based on their own notions of high-quality MR 
practice. 

Data Collection and Analysis: 

Description of the methods for collecting and analyzing data. 



Guided by the unique aspects of Math Recovery tutoring listed above (i.e., the tutor’s 
ongoing assessment of the child’s thinking and efforts to provide instruction within the child’s 
zone of proximal development), our goal in assessing implementation fidelity was to answer a 
set of key questions regarding tutors’ assessment and instruction: (a) Was the initial assessment 
done? If so, was it done correctly? (b) In instructional lessons, did the tutor choose procedures 
(i.e., sets of related tasks) that were in the child’s zone of proximal development (according to 
the MR Frameworks)? (c) Did the tutor utilize/implement the procedures/tasks well? 

Regarding the first question, we identified two possibilities for breakdown: the tutor 
might have 1) presented the incorrect assessment tasks (or tasks that were misaligned with those 
printed in the assessment), or 2) used poor judgment in interpreting the results (i.e., assigned a 
profile to the student that conflicted with our external assessment of the child’s current 
understanding). For each of these we defined what constituted a minor error, a major error, or 
no error. 

To answer the second question, regarding tutors’ choice of procedures, coders first 
viewed up to three previous tutoring sessions to locate the child’s thinking at that point on the 
MR Learning Framework, and then determined whether the tutor’s choice of procedures matched 
the child’s placement on the MR Learning Framework. That is, did the tutor’s choice of 
procedures align with what the MR Instructional Framework suggested? Often tutors utilized 
procedures as described in the MR handbook, but when they incorporated procedures from other 
sources, coders located those procedures on the Instructional Framework based on the 
procedure’s focus (e.g., arithmetical strategies, number word sequences, etc.), and the level of 
difficulty of the tasks within the procedure, including number range and the extent to which the 
tasks were scaffolded. 

Lastly, to answer the question pertaining to tutors’ implementation of tasks (within 
procedures), coders examined the extent to which tutors followed established “rules” within the 
MR program (e.g., things a tutor is supposed to do, or prohibitions). For example, tutors are 
expected to consistently solicit students’ strategies for solving problems (if the strategy is not 
already visible), and are expected to avoid merely eliciting particular behaviors. 

After four weeks of refinement work (described above), agreement percentages plateaued 
at an inadequate level, largely due to differences in how coders ‘chunked’ the lessons they were 
coding (e.g., was it one big task, or two?). Therefore, the evaluation team identified a 
representative aspect of the MR Instructional Framework about which coders’ structural 
decisions had consistently agreed and for which all codes would remain relevant. Of the six 
aspects included in the MR Learning Framework, two of them (Stages of Early Arithmetical 
Learning, and Tens and Ones) represent the heart of the theory underlying the MR program. 
Although lessons typically include practice on other aspects such as number word sequences or 
numeral identification, it is these two aspects that pertain directly to MR’s unique aspects listed 
above. Therefore, video coding focused on instances of activities aimed at supporting students in 
developing more sophisticated strategies, rendering the fidelity assessment process more 
tractable without sacrificing any attention to core implementation components. 

Findings / Results: 

Description of main findings with specific details. 



Throughout the coding process (after the initial refinement phase), coders maintained an 
average agreement of 80%. Furthermore, MR experts’ ratings validated our coding 
schemes, with sufficiently high correlations between their ratings and those based on fidelity 
indices (r = .75, p < .05). 
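
For readers unfamiliar with these statistics, the sketch below shows one common way to compute percent agreement between two coders and a correlation between two sets of ratings. It is a minimal illustration under stated assumptions: the abstract does not say how agreement was aggregated across coders or which correlation statistic was used, so simple pairwise agreement and a Pearson correlation are assumed here, and the example codes and function names are hypothetical.

```python
from math import sqrt


def percent_agreement(codes_a, codes_b):
    """Share of coded segments on which two coders assigned the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("code lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numeric ratings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical codes for five segments, using the error levels named above.
coder_1 = ["none", "minor", "major", "none", "none"]
coder_2 = ["none", "minor", "none", "none", "none"]
agreement = percent_agreement(coder_1, coder_2)  # 4 of 5 segments match
```

Percent agreement does not correct for chance agreement; a chance-corrected statistic would be a natural alternative, although the abstract reports only percent agreement.
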

Conclusions: 

Description of conclusions and recommendations based on findings and overall study. 



Our findings suggest it is possible to create a reliable instrument to measure 
implementation fidelity for differentiated interventions — an endeavor that has, heretofore, been 
largely avoided in evaluations of educational interventions. Many potentially high-quality 
interventions are un-scripted, instead relying on teacher knowledge and professional 
development, requiring considerable differentiation by implementers. As we work to rigorously 
evaluate such programs, we need to develop reliable fidelity measures that are both feasible and 
true to program components, so that evaluators can adequately link measures of treatment 
integrity to outcomes, to more accurately determine the relative strength of interventions 
(Cordray & Pion, 2006). 

In addition, by coding for instances of “positive infidelity,” we have identified “local 
additions” (Blakely et al., 1987) that could possibly strengthen the design of the program. 
Members of the research team have already provided feedback (e.g., at Math Recovery 
practitioner conferences) to challenge developers’ current conceptions and support them in 
improving the program. 

This paper outlines the development and use of a fidelity measure as a case of how such 
instruments might be developed and used in the future. Critical aspects of the process included 1) 
the identification of the core implementation components of the intervention; 2) close work with 
program developers to operationalize those components; 3) training of coders in both the 
program itself and the coding schemes/process; and 4) collaboration with the coding team to 
further refine operationalizations and coding decisions, to strike a balance of feasibility and 
adherence to program components. 